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Abstract Smart well technologies, which allow remote 
control of well and production processes, make the prob¬ 
lem of determining optimal control strategies a timely 
endeavour. In this paper, we use numerical optimiza¬ 
tion algorithms and a multiscale approach in order to 
find an optimal well management strategy over the life 
of the reservoir. Optimality is measured in terms of the 
values of the net present value objective function. The 
large number of well rates for each control step make 
the optimization problem more difficult and at a high 
risk of achieving a suboptimal solution. Moreover, the 
optimal number of adjustments is not known a priori. 
Adjusting well controls too frequently will increase un¬ 
necessary well management and operation cost, and an 
excessively low number of control adjustments may not 
be enough to obtain a good yield. We investigate three 
derivative-free optimization algorithms, chosen for their 
robust and parallel nature, to determine optimal well 
control strategies. The algorithms chosen include gener¬ 
alized pattern search (GPS), particle swarm optimiza- 
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tion (PSO) and covariance matrix adaptation evolution 
strategy (CMA-ES). These three algorithms encompass 
the breadth of available black-box optimization strate¬ 
gies: deterministic local search, stochastic global search 
and stochastic local search. In addition, we hybridize 
the three derivative-free algorithms with a multiscale 
regularization approach. Starting with a reasonably small 
number of control steps, the control intervals are sub¬ 
sequently refined during the optimization. Results for 
experiments studied indicate that CMA-ES performs 
best among the three algorithms in solving both small 
and large scale problems. When hybridized with a mul¬ 
tiscale regularization approach, the ability to find the 
optimal solution is further enhanced, with the perfor¬ 
mance of GPS improving the most. Topics affecting the 
performance of the multiscale approach are discussed 
in this paper, including the effect of control frequency 
on the well control problem. The parameter settings 
for GPS, PSO, and CMA-ES, within the multiscale ap¬ 
proach are considered. 

Keywords Well Control • Production Optimization • 
Derivative-Free Algorithms • Multiscale Approach 

1 Introduction 

Determining the well production and injection rates is 
of paramount importance in modern reservoir develop¬ 
ment. The decision is difficult since the optimal rates 
depend on the heterogeneity of the rock and liquids, 
the well placements and other parameters. Indeed, these 
properties and input parameters are coupled in a highly 
nonlinear fashion. Moreover, the optimal production 
and injection rates are usually not constant throughout 
the life cycle of reservoir. The oil saturation distribu¬ 
tion changes during the well injection and production 




2 


Xiang Wang et al. 


processes. This will then affect the optimal production 
and injection rate for each well. 

Well control planning can be formulated as an op¬ 
timization problem, using economic or cumulative oil 
production as the objective function. The well rates or 
bottom hole pressures at different times are the opti¬ 
mization variables. Many optimization algorithms have 
been investigated to solve such problems. These algo¬ 
rithms can be broadly placed in two categories: derivative- 
based algorithms and derivative-free algorithms [25]. 

Derivative-based or gradient-based algorithms, take 
advantage of the gradient information to guide their 
search. This type of algorithm, commonly used in well 
control optimization, includes steepest ascent |42| , con¬ 
jugate gradient [5], and sequential quadratic program¬ 
ming methods [53]. Gradients of the objective function 
may be calculated by using an adjoint equation. This is 
an invasive approach, requiring a detailed knowledge 
of mathematics inside the reservoir simulator [251 l8| . 
Other ways to approximate the gradients include meth¬ 
ods such as finite difference perturbation [2], or the si¬ 
multaneous perturbation stochastic approximation |42j . 
These algorithms assume a certain degree of smooth¬ 
ness of the objective function with respect to the opti¬ 
mization variables. Derivative-based algorithms are po¬ 
tentially very quick to converge but sometimes fall into 
local optimal. 

Derivative-free algorithms can be subdivided into 
local search methods and global search methods. Lo¬ 
cal derivative-free algorithms include generalized pat¬ 
tern search (GPS) |13|, mesh adaptive direct search 
(MADS) US], Hooke-Jeeves direct search (HJDS) pS] . 
ensemble-based optimization (EnOpt) [32] . covariance 
matrix adaptation evolution strategy (GMA-ES) [71I3T]. 
and so on. These methods have strong ability to find ac¬ 
curate optima in a local space, but may face with some 
difficulties in finding global optima, especially when a 
good initial guess is not available. Global derivative- 
free algorithms search through the entire space and 
provide techniques to avoid being trapped in local op¬ 
tima. Examples of global search algorithms include ge¬ 
netic algorithms (GAs) [5|, particle swarm optimization 
(PSO) pS], and differential evolution (DE) |5^[5]. Al¬ 
though these algorithms are robust and easy to use, 
they often require more function evaluations than lo¬ 
cal search and derivative-based algorithms to converge. 
However, most of these algorithms parallelize naturally 
and easily, which make their efficiency satisfactory [101 . 
Recently, some hybridization of these techniques such 
as PS0-MADS [231E31[^ . multilevel coordinate search 
(MGS) Plj, etc., are developed and applied in well 
placement and/or well control optimization problems. 
These methods provide global search capabilities in ad¬ 


dition to local convergence. The performance of MGS 
for well placement and control optimization is discussed 
in [33] • 

The optimization algorithms mentioned above can 
be further classified as either stochastic or determinis¬ 
tic. Stochastic methods use information from the pre¬ 
vious iteration and a random element to generate new 
search points. The random component of the algorithm 
makes it more likely to avoid local optima, but it may 
also make the control of solution quality difficult, espe¬ 
cially with a limited computational budget. The stochas¬ 
tic algorithms from the above list are MADS, GMA-ES, 
GA, PSO, and DE. Deterministic methods have no ran¬ 
dom element. For a given problem, deterministic meth¬ 
ods will give the same results for each trial. GPS, HJDS, 
and MGS are examples of deterministic algorithms. 

All the above mentioned algorithms have been used 
in well control optimization and/or well placement opti¬ 
mization problems. The performance of the algorithms 
are problem-dependent. Some of the algorithms, like 
GPS (1960s), GA (1960s), and PSO (1990s), have been 
around for decades, and have been used in petroleum 
industrial problems for a relatively long time. People 
has accumulated a great deal of experience through 
case studies. GMA-ES, developed in 2000s [TS], was 
first used in petroleum related optimization problems 
only in 2012 [7]. Though GMA-ES showed great perfor¬ 
mance in solving well placement optimization problem 
[7] ; to date there has been little work to apply it in well 
control problems. 

Although many optimization algorithms have been 
used, well control optimization is still challenging prob¬ 
lem and an active area of research. The number of op¬ 
timization variables is large in many real-life scenar¬ 
ios. The required number of function evaluations will 
rise sharply with the increase of the number of vari¬ 
ables. A single function evaluation requires one reser¬ 
voir simulation which is often very demanding in terms 
of GPU time. The non-convex, non-smooth and multi¬ 
modal objective surface further increases the optimiza¬ 
tion difficulty. 

It is difficult to find a reasonable frequency for well 
control; an excessively low number of control adjust¬ 
ments may not truly optimize oil recovery. Adjusting 
each well control too frequently imposes an unrealistic 
control burden on operations, increasing the total well 
management cost. Moreover, imposing a high number 
of control adjustments increases the complexity of the 
optimization problem so much that there is a high risk 
of optimization algorithms becoming trapped at local 
optima and hence missing the optimal startegy |38j . 
Multiscale regularization approaches are developed to 
address these problems. The main idea of the multi- 
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scale approach is to start the optimization process with 
a very coarse control frequency (and thus, with a small 
number of control variables) and refine successively. 
The solution at the coarse-scale is used as the initial 
guess of controls for the next finer scale optimization 

m- 

Lien et al. m used an adaptive multiscale regu¬ 
larization approach with gradient-based algorithms for 
well control optimization. The number and time of con¬ 
trol adjustments are chosen based on heuristically de¬ 
fined refinement indicators, which are calculated by us¬ 
ing gradients of the objective function. Shuai et al. [35] 
applied ordinary multiscale regularization, also called 
a successive-splitting multiscale approach, to find ap¬ 
propriate frequencies for well control adjustment. The 
optimization starts with a coarse number of control ad¬ 
justments and subsequently splits each control step into 
two new ones at every iteration. Two optimization al¬ 
gorithms are considered in their works: ensemble-based 
optimization (EnOpt) [321138] and bound optimization 
by quadratic approximation (BOBYQA) [37] • More re¬ 
cently, Oliveira et al. [53| provide an adaptive hierar¬ 
chical multiscale approach with EnOpt and an adjoint 
method for estimation of optimal well controls. The 
number and lengths of controls are selected adaptively 
by splitting/merging control steps on the optimization 
proceeds. Most of the algorithms are gradient-based. 
The performance and suitability of multiscale methods 
combined with derivative-free algorithms for well con¬ 
trol optimization still needs further attention and is the 
focus of this paper. 

Given the prevalence of the use of derivative-free op¬ 
timization algorithms for well control optimization and 
the need for a multiscale approach for problems with a 
large number of control variables, we consider the nat¬ 
ural marriage of the two philosophies. In this paper, 
we combine a multiscale framework with three typi¬ 
cal derivative-free optimization algorithms for the well 
control problem. We choose a deterministic local search 
method - generalized pattern search (GPS), a stochas¬ 
tic local search method ~ covariance matrix adaptation 
evolution strategy (CMA-ES), and a stochastic global 
search method ~ particle swarm optimization (PSO). 
The performance of each algorithm is analyzed with 
two reservoir models. Although GPS, PSO and GMA- 
ES are widely used in petroleum engineering and many 
other areas, to the best of our knowledge there has been 
no attempt to combine these methods with a multiscale 
approach to solve the well control optimization prob¬ 
lem. Furthermore, we give detailed discussions on the 
following topics: 


• The effects of control frequency on well control op¬ 
timization, including the effects on the production 
and the effects on the control strategy. 

• The performance of GPS, PSO, and CMA-ES with 
and without the multiscale framework for well con¬ 
trol optimization. 

• The performance of the approaches as a function of 
computational budget and the effect of a parallel 
environment. 

• The best parameter settings for GPS, PSO, and 
CMA-ES to maximize their performance within a 
multiscale framework. 

• The best configuration for the multiscale framework, 
including the parameter settings and the choice of 
hybridized algorithm. 

This paper is structured as follows. Section 2 de¬ 
scribes the well control problem formulation. Section 
3 gives brief description of derivative-free optimization 
methods used. Section 4 gives detailed description of 
the multiscale approach used and the framework of com¬ 
bining the multiscale approach with the three optimiza¬ 
tion algorithms. In Section 5, we detail the computa¬ 
tional methodology and describe the reservoir models 
used in this paper. Section 6, we present the results and 
discussion for the experiments. Finally, in Section 7, we 
provide a summary and conclusions of this work. 

2 The well control optimization problem 

In this section, we describe the well control optimization 
problem, including the objective function of interest, 
the control variables and the imposed constraints. 

The typical objective function associated with a well 
control problem evaluates an economic model and takes 
into account different costs such as the price of oil, the 
costs of the injection and the production of water. An¬ 
other alternative is to use the cumulative oil production 
or the barrel of oil equivalent |7] . In this work, the objec¬ 
tive function of interest is the net present value (NPV) 
of a time series of cash flows. For the two-phase flow of 
oil and water, the NPV is defined by 

r At / 

NPV (u) = ^ I Tgpq^piu) + ropq”p(u) 

u=i L(l + &) " ^ 

- c™pq(),p(u) - c^,q",(u)^ , 

where u is set of control variables during the reservoir 
productive lifetime; q^^, q”p and q”^, respectively, de¬ 
note the average gas rate, the average oil rate and the 
average water rate for the nth time step; q” j is the av¬ 
erage water-injection rate for the nth time step; Vgp and 
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Top are the gas and oil revenue; c^p is the disposal cost 
of produced water; is the water injection cost; Nt 
is total number of time steps; is the time at the end 
of nth time step; and is nth time step size. The 
quantity r provides the appropriate normalization for 
tn, e.g., T = 365 days. The quantity b is the fractional 
discount rate. 

The optimization variables u could contain the well 
bottom hole pressures or the well liquid rates. In this 
work, we control wells by specifying the liquid rates. 
The vector u is an W^-dimensional column vector, where 
Nu is the total number of well controls. Assuming each 
well has the same frequency of control steps, then Nu = 
Nt ■ Nuj, where fVu, is the total number of wells and Nt 
is the total number of time steps. 

Well control optimization during the reservoir life 
cycle can be expressed as the following mathematical 
problem: 


max NPV (u), 

(2) 

subject to u;f, < u < u„f,, 

(3) 

c(u) < 0, 

(4) 

e(u) = 0, 

(5) 


where NPV (u) is the objective function given by equa¬ 
tion 0 . And, in order, equations are the bound, 

inequality, and equality constraints (if any) imposed on 
the problem. The quantities u;;, and u„b are the lower 
and upper bounds for control variables, where the in¬ 
equality is understood to apply component-wise. There 
are several methods which could be used to handle con¬ 
straints in derivative-free optimization algorithms in 
the literature [7]. In general, infeasible individuals can 
be rejected, penalized, or repaired. Although all three 
algorithms used in this paper can be adapted to handle 
the three types of constraints, we restrict ourselves to 
bound constraints only. 


3 Overview of the derivative free optimization 
algorithms used 

In this section, we briefly describe the derivative-free 
optimization algorithms considered in this paper: GPS, 
PSO, and CMA-ES. These are typical derivative-free, 
black-box optimization algorithms. Each method has 
distinct characteristics and all have been applied suc¬ 
cessfully to solve reservoir development problems as 
mentioned in the introduction. 


3.1 Generalized Pattern Search 

The generalized pattern search (GPS) is an example of 
a direct search method. GPS is a deterministic local 
search algorithm. It does not directly use or require the 
gradient of the objective function to be specified (or 
even to exist). Hence GPS can be used on functions 
that lack smoothness, those that are not continuous or 
differentiable. GPS can be applied in situations when 
the objective function is rough and multi-modal and 
hence the gradients do not guide directions of global 
ascent GPS is guaranteed to converge to locally 

optimal solutions and can provide useful approximate 
solutions for some global problems p i40ll45] . 

A basic generalized pattern search proceeds as fol¬ 
lows. Choose an initial point and evaluate the objective 
function at that point. Then evaluate the function on a 
pattern specified by set of directions and a step or mesh 
size. After the search is complete along all the directions 
of the pattern, the point with the highest function value 
becomes the current point for the next iteration. The 
step size for the next iteration will be multiplied by a 
factor that is larger than 1 (an expansion factor) if one 
or more previous moves found a better point, or a factor 
between 0 and 1 (a contraction factor) if no better point 
is found. In one iteration, if all pattern directions are 
evaluated before choosing a new current point, we call 
it a complete poll. The algorithm can also provide an 
incomplete poll procedure instead of the complete one. 
For an incomplete poll, the algorithm stops searching 
along the directions as soon as it finds a point whose 
objective function value is more than that of the cur¬ 
rent point, and the point it finds becomes the current 
point at the next iteration. The search stops when, for 
example, a specified minimum step size may be reached. 

The choice of the GPS pattern can dramatically af¬ 
fect the performance of the algorithm. From the current 
point, the pattern determines the search directions for 
each iteration. The pattern is usually expressed as a set 
of vectors {v^ G K" : i = 1, ■■■ ,r} which form a posi¬ 
tive basis. Every vector in K” can be written as a linear 
combination aivi -I- • • • -|- a^Vj. with all coefficients at 
are zero or greater. No vector of the positive basis can 
be expressed by a positive linear combination of other 
members of the basis. Using a positive basis is benefi¬ 
cial in GPS because they give small numbers of search 
directions [1]. Two positive basis are commonly used as 
search directions in GPS, namely the maximal basis and 
the minimal basis. For an n-dimensional optimization 
problem, maximal and minimal basis sets have 2n and 
n-l- 1 vectors respectively [53]. Fig. shows an example 
of maximal positive basis (a) and a minimal positive 
basis (b) in 
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(b) Minimal positive basis 


Fig. 1 Maximal positive basis and minimal positive basis 
vectors in 

For bound constrained problems, GPS needs a fea¬ 
sible initial point and keeps feasibility of the iterates by 
rejecting any trial point that is out of the feasible re¬ 
gion. For an infeasible trial point, the objective function 
is not evaluated and set to infinity. 

As mentioned above, GPS is guaranteed to converge 
to locally optimal solutions. Evaluating the objective 
function at each point in a maximal basis and using 
complete poll will be quite expensive, however, consid¬ 
ering the complexity of well control problems we in¬ 
deed choose the maximal basis and complete poll for 
our tests. The expansion factor is set to 2, and the con¬ 
traction factor is set to 0.5. 


3.2 Particle Swarm Optimization 

Particle swarm optimization (PSO) is a population- 
based stochastic search method. The PSO search mech¬ 
anism mimics the social behaviour of biological organ¬ 
isms such as a flock of birds |28]. PSO can search very 
large space of candidate solutions, and the stochastic 
element of the movement of the population reduces the 
chance of getting trapped at an unsatisfactory local op¬ 
timum, however, PSO does not guarantee an optimal 
solution is ever found. In spite of this, PSO has been 


successfully applied for both well placement and pro¬ 
duction optimization |35[I341[26] . 

PSO initially chooses a population of candidate so¬ 
lutions (called a swarm of particles). These particles 
move through the search space in search of function im¬ 
provement according to a random rule which updates 
each particle’s position. Each particle’s position xf’ and 
velocity changes during each iteration. For objec¬ 
tive function / : M" —>• M", both xf and vf are n- 
dimensional vectors. The objective function value for 
each particle, is evaluated simply using the posi¬ 

tion of the particle. Each particle’s movement is guided 
by the best position it has found so far, p^, and the 
best known position of all particles or particles in some 
neighborhood, . 

Eollowing initialization, the PSO algorithm [55] up¬ 
dates each particle’s position and velocity as: 

xf+i = xf-b (6) 

and 

+ cir^® - xf) (gf - xf) . (7) 

Equation include three parts: vf represents the 
tendency to continue moving along the particle’s cur¬ 
rent direction and velocity, (pf — xf’) represents the 
tendency to move to the best position found by the 
particle itself so far, and (gj’ — xf) represents the ten¬ 
dency to move to the best position found by all parti¬ 
cles in its neighborhood. The quantities w, ci and C 2 are 
weighting parameters and r* and y\ are stochastic n- 
dimensional vectors which are generated from the uni¬ 
form distribution on (0,1) during each iteration. The 
operator 0 indicates component-wise multiplication. 
The random element helps ensure that PSO avoids pre¬ 
mature convergence to a local minimum by facilitating 
sufficient global exploration of the search space Eami. 

Here we use an absorbing strategies to handle bound 
constraints for PSO. The invalid particles are set to 
the nearest boundary position. The respective velocity 
components are set to zero mmn. 

Unlike local search methods, such as GPS, the PSO 
algorithm may avoid local optima with its stochastic 
and global search capability. PSO has been used in 
petroleum and other fields with excellent effect. How¬ 
ever, no number of function evaluations can guarantee 
convergence to the global optima. The selection of the 
algorithmic parameters has a considerable affect on the 
performance of the algorithm muMj. For the parame¬ 
ters w, Cl and C 2 , Perez et al. [36| demonstrated that 
the particle swarm is only stable if the following two 
conditions are satisfied: 

0 < (ci -b C 2 ) < 4, 


( 8 ) 
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and 

i(ci + C 2 ) - 1 < w < 1. (9) 

Following the above principles, our implementation 
of PSO uses weighting parameters of w = 0.9, ci = 0.5, 
and C 2 = 1.25. We use a global best neighbourhood 
topology, meaning that every particle communicates 
with every other particle in the swarm, and thus gf can 
be replaced by a single vector g^, representing the best 
solution found so far. Evaluating the objective function 
for all members of the swarm is an embarrassingly par¬ 
allel operation. 


3.3 Covariance Matrix Adaptation Evolution Strategy 


Typically fi is chosen as /r = [A/2J, where [ J is the floor 
function, and uji are strictly positive and normalized 
weights. 

The covariance matrix is then updated as 


= (1 - Ccov) C'= + — 

A^COV 

“1“ Ccov ( 1 

l-^CO' 


(13) 




where quantity pjl is called the evolution path. It gives 
a direction where we expect to see good solutions. The 
evolution path is given iteratively as 


Covariance Matrix Adaptation Evolution Strategy (CMA- 
ES) is a population-based stochastic optimization algo¬ 
rithm. Unlike GA, PSO, and other classical population- 
based stochastic search algorithms, candidate solutions 
of CMA-ES are sampled from a probability distribution 
which is updated iteratively. This algorithm performs 
better on the benchmark multimodal functions than all 
other similar classes of learning algorithms [44] . CMA- 
ES also showed its potential in well placement and con¬ 
trol optimizations wm- 

CMA-ES samples a population of A candidate solu¬ 
tions at iteration k according to: 

xf = Af (m^ (a'=)2c'=) , for i = 1, • • • , A, (10) 


where Af {■ ■ ■ , • • •) is a random vector from a multivari¬ 
ate normal distribution. 

The mean vector represents the favorite solu¬ 
tion or best estimate of the optimum, and the covari¬ 
ance matrix is a symmetric positive definite matrix 
which characterizes the geometric shape of the distribu¬ 
tion and defines where new candidate solutions are sam¬ 
pled. The step-size a is used as a global scaling factor 
for the covariance matrix. It aims at achieving fast con¬ 
vergence and preventing premature convergence. These 
three parameters are updated as the iteration proceeds. 

The A individuals generated by equation (10) are 
evaluated and ranked by objective function value. The 
mean is then updated by taking the weighted mean 
of the best fi individuals: 


m 


k + l _ 






( 11 ) 


where is the best individual. 

The default weights |3] are chosen as 


In (/r -I- 1) — In {i) 
/rln (/i -I- 1) - In (/r!) ’ 


for i = 1, • • • , /r. 


( 12 ) 


= (1 - Cc) Pc + \/Cc(2-Cc) 


Meff 


(14) 


where Cc is a constant in (0,1]. The quantity /XeS = 
1/ X)i(=i denotes the variance effective selection mass. 
It is a measure characterizing the recombination. From 
equation (141 we can see that the new search direction 
is based on the old direction and the descent 

fe +1 fc 

direction — — ,7™ . 

cr^ 

The adaptation of the global step size is given 

by 


exp 




E|lAf(0,I)|| 


- 1 


(15) 


which depends on the conjugate evolution path 
given by 


p^i=(l-c.)p^ 

-I- \/co- (2 - Ca) fiescr'' ^^ . 

(16) 

In combination with covariance matrix adaptation, 
step-size adaptation enables linear convergence on a 
wide range of, even ill-conditioned, functions |7]. 

Fig.i is an illustration of an optimization run with 
CMA-ES on a two-dimensional linear function /(x) = 
xl + X 2 with bound constraints xi,X 2 S [—800,800]. 
The dashed lines are the contour lines of the function. 
The initial point is [—200, —200]. The optimal solution 
is [0, 0] and is marked by blue symbol ‘x’. The orange 
and gray dots denote the population distribution for 
the current and the last iteration, respectively. The red 
cross (‘-b’) and the red ellipsoid denote the symmetry 
center m and the isodensity line of the distribution for 
the current iteration, while the black cross and the black 
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ellipsoid give these quantities for the last iteration. The 
isodensity line defined as (x — m)^C”^(x — m) = c 
where the constant c = 3 [7]. The population is much 
larger than necessary, but the figure clearly shows how 
the distribution of the population changes during the 
optimization. 

For bound constrained problems, CMA-ES can sim¬ 
ply ‘repair’ an infeasible individual to its nearest feasi¬ 
ble solution by using a repair algorithm before the up¬ 
date equations are applied m- This strategy is not rec¬ 
ommended because CMA-ES makes implicit assump¬ 
tions on the distribution of individuals. The distribu¬ 
tion can be violated by a repair. Here we choose the 
penalization strategy from mm- The infeasible points 
are repaired by using the repair algorithm and evalu¬ 
ated, and a penalty is added to the function value for 
every repaired point. The penalty is dependent on the 
distance to the repaired solution. 

Our implementation of CMA-ES uses the parame¬ 
ters given in the work of Hansen et al. E!. The param¬ 
eter values are given in Table The initial values used 
are p° = p° = 0 and C° = I, while x^^^ and cr° are 
user supplied. 

Table 1 Strategy parameter values used in CMA-ES from 

El- 

Parameter Value 

A 4 -h [3 ln(r!,)J 

M LA/2J 

c 4 _ 

n + 4 

^ Meff-t-2 

^ ~t~3 

da 1-1-2 max ^0, \J + Ca 

A^cov /^eff 

Ccov — . + fl-—) min fl, , ) 

Mcov (n, + v 2 )^ \ Mcov / V ^ + + J 


4 A Multiscale framework 

In production optimization, specifying the frequency of 
needed well control adjustment is a challenge. On one 
hand, a high frequency adjustment of control parame¬ 
ters imposes unrealistic burden on operations, leading 
to an increase in well management costs. In addition, 
from an optimization perspective a high frequency of 
control adjustments implies an explosion in the num¬ 
ber of control variables, requiring a great amount of 
computation and time to get an optimal solution. This 
may be especially true for derivative-free algorithms, 
which may need many more function evaluations than 
gradient-based algorithms. Many degrees of freedom 


also increase the risk of an optimization algorithm be¬ 
ing trapped in a local optimum. On the other hand, 
imposing too few control adjustments may not truly 
optimize oil recovery. 

Multiscale regularization provides a way to address 
the complexity of the optimization problem with a large 
number of control adjustments. The multiscale approach 
starts with a coarse number of control steps and suc¬ 
cessively increases the frequency of control adjustments 
using the coarse-scale solution as the initial guess for 
the next finer scale optimization [30113211^ . The refine¬ 
ment process is terminated when a specified stopping 
criteria is satisfied. For example, a maximum number 
of control adjustments or a minimum allowable change 
in the objective function could be imposed. 

To the best of our knowledge, three related mul¬ 
tiscale approaches have been investigated for the well 
control optimization problem. The first approach, first 
seen in |30) . is referred to as ordinary multiscale or 
successive-splitting multiscale [38] . The optimization 
starts with a coarse number of control adjustments and 
subsequently splits each control step into two new ones 
at every iteration. The second optimization strategy, 
also proposed by Lien et. al. [30], uses the magnitude 
of the components of the gradient of the objective to de¬ 
termine refinement indicators. The algorithm progres¬ 
sively increases the number of variables using the re¬ 
finement indicators to choose the most-efficient parti¬ 
tioning of the current control steps to increase the value 
of the objective function. The third approach is called 
the hierarchical multiscale method mm- It is simi¬ 
lar to the ordinary multiscale approach in [501135] . but 
the algorithm can also merge existing control steps by 
considering the difference between well controls at two 
consecutive control steps and the gradient of the objec¬ 
tive function with respect to the well controls. 

The goal of the present work is to explore the fea¬ 
sibility of improving the performance of derivative-free 
algorithms in solving large scale well control optimiza¬ 
tion problems by using a multiscale approach. We choose 
the successive-splitting multiscale approach because the 
other two methods, the refinement indicator multiscale 
approach and the hierarchical multiscale approach, re¬ 
quire gradient information of the objective function - 
information we do not assume is available. Furthermore, 
one recent study compared the sophisticated refinement 
indicator and hierarchical multiscale approaches with 
the simpler successive-splitting approach and showed 
similar performance |32j . 

Before we introduce our modified approach, we take 
a look at the original successive-splitting multiscale ap¬ 
proach. As Shuai et al. describe in [35], the successive- 
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(a) Iteration 1 (b) Iteration 3 (c) Iteration 5 

Fig. 2 An illustration of CMA-ES optimization on a two-dimensional linear function /(x) = x'f x"^. The dashed lines are 
the contour lines of the function. The optimal solution is in the upper right corner. The orange and gray dots denote the 
population distribution for the current and the last iteration, respectively. The black cross and the black ellipsoid denote 

the symmetry center and the isodensity line of the distribution for the current iteration, while the gray cross and the gray 
ellipsoid are for the last iteration. 


splitting multiscale algorithm generally loops over the 
following steps: 

1) INITIALIZATION One control step for each well 
(initial steps Uq = I); The number of unknowns 
is equal to the number of wells; Initial guesses of 
control are assigned to each well. 

2) OPTIMIZATION Solving the well control optimiza¬ 
tion problem using an optimization algorithm. 

3) SPLITTING Split each control step into two steps 
of equal length (split factor Ug = 2); This doubles 
the number of control variables; Use the solution 
from step 2) as the initial well control; Go to step 
2 ). 

Our experience indicates the efficacy of a multiscale 
approach depends on two key parameters: the num¬ 
ber of control steps for each well at the beginning of 
the optimization (i.e. the number of initial steps no) 
and the multiplicative increase in the number of con¬ 
trol steps at every iteration (i.e. the split factor Us). 
As mentioned, the successive-splitting multiscale ap¬ 
proach used in [38] starts the optimization procedure by 
finding the optimal control strategy assuming one con¬ 
trol step (no = !)• Subsequent optimizations split the 
number of control steps by a fixed split factor Ug = 2. 
We show that this configuration of the two parameters 
is not always the most efficient configuration. On one 
hand, the optimal well control strategies with a very 
coarse parametrization may be dramatically different 
than with a fine parametrization (or large number of 
control adjustments). Hence the solution found by a 
very coarse parametrization is not useful as an initial 
guess to find the optimal fine parametrization or will re¬ 
quire many successive splittings. This observation has 
to be balanced with the realization and motivation that 
the problem with a large number of control adjustments 
is too difficult solve immediately. The split factor is the 


key to balance the difficulty of optimization problem 
at each scale and the total number of scales. With a 
higher split factor, less scales are needed to reach the 
maximum number of control steps. We will show this is 
more efficient in some cases. 

Based on the above, in addition to coupling the mul¬ 
tiscale approach with commonly used derivative free al¬ 
gorithms, we consider the effect of the choice of the ini¬ 
tial number of control number steps ng and the choice 
of Ug in the overall efficiency of the multiscale opti¬ 
mization process. We show this added flexibility in our 
algorithm is useful in some situations. In our modified 
multiscale approach, we left the choice of the initial 
number of steps and the choice of the split factor to the 
user. We start the multiscale algorithm with a reason¬ 
ably small value of ng - the initial number of control 
steps, and then find the associated optimal controls. 
After maximizing objective function on the basis of the 
initial control steps, we split each control step into sev¬ 
eral steps depending on the split factor Ug as 

x,;+i(n) = Xj^d'n/ris]), n = 1,2, • • • ^N^xug, (17) 

where Xi_(_i(n) is the nth variable in the initial guess 
for the (i-l-l)th scale; Xi^d’n/n^]) is the [n/ngjth vari¬ 
able in the optimum solution for the Ah scale, |" ] is the 
ceiling function; is the total number of variables for 
the Ah scale. With this formula, the total number of 
variables for the (z -I- l)th scale becomes x rig, and 
every Ug variables for the {i -I- l)th scale use the opti¬ 
mum solution of the Ah scale. This process of splitting 
the control steps and performing a new optimization is 
continued until the maximum number of control steps 
is reached. 

Fig-i gives an illustration of how the successive- 
splitting multiscale approach splits the control steps to 
give the next finer scale. In this figure, we show the 
resulting number of control steps for two choices of Ug 
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(rtg = 2 and rig = 4) assuming the number of initial 
steps is no = 2. 

Control step=2 


I_I_I 


ns=2 

03=4 

1 1 1 1 

1 1 1 1 1 1 1 1 1 

Control step=4 

Control step=8 


Fig. 3 Control steps split by the successive-splitting multi¬ 
scale approach. 

Our well control optimization procedure using a derivative- 
free multiscale approach is described by the following 
steps (Algorithm . A flow chart of the algorithm is 
given in Fig. 


Algorithm 1 The multiscale approach with derivative- 
free algorithms 

select solver: GPS, PSO or CMA-ES 

set initial control steps for each well no, and the split factor 
ris 

set initial guess xo 
iteration i ■<— 0 

while not (global stopping criteria reached) do 
while not (scale stopping criteria) do 
solve X* = argmax NPV (x) 
end while 
let Xit = X* 

split, set control steps for each well ni_|_i Ui x 
update the initial guess, Xi+i(n) = Xi»([n/ns]), n = 
1, 2, • • ■ , Ntjj X TLs 
test the scale stopping criteria 

end while 


At early scales of the algorithm, optimizations are 
performed with less stringent convergence tolerances to 
find an approximate solution. The tolerances are made 
smaller as the algorithm proceeds, using the smallest 
tolerance at the last scale. With this approach, we re¬ 
duce the computational cost at early scales. The toler¬ 
ance settings for the experiments can be found in Sec¬ 
tion |6l 


5 Example cases 

In this section, we list all approaches considered, and 
give a detailed description of the reservoir models used 
in this paper. 



Fig. 4 Flow chart of well control optimization with multi¬ 
scale approach and derivative-free algorithms. 

5.1 Optimization Approaches 

Approaches considered in this paper include the three 
original optimization algorithms, GPS, PSO, and CMA- 
ES as described in Sectionj^and three hybrid approaches 
that combine the original algorithms with our modified 
multiscale method described in Section [i] The hybrid 
multiscale approaches are labeled as M-GPS, M-PSO, 
and M-CMA-ES. 

To investigate the effect of tiq and Ug, we test four 
different configurations for each hybrid approach. We 
use the Roman numerals I, II, III, and IV to represent 
the four configurations. The configurations used are: 

• Configuration I, —the initial number of control steps 
for each well is no = 1 and the split factor is ns = 2. 
With this configuration, the multiscale method is 
the same as the successive-splitting multiscale method 
from [38] . 

• Configuration II, —the initial number of control steps 
for each well is no = 2 and the split factor is ns = 2. 

• Configuration III, —the initial number of control 
steps for each well is no = 2 and the split factor is 
Us = 4. 
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• Configuration IV, —the initial number of control 
steps for each well is no = 1 and the split factor is 
ris = 4. 

Fig. [5] provides an overview of all approaches con¬ 
sidered in our experiments. The approaches fall into 
different quadrants according to their search features. 



Fig. 5 An overview of all approaches we considered in our 
experiments. The approaches fall into different quadrants ac¬ 
cording to their search features. 


5.2 Model description 

Two reservoir models are considered in this paper. The 
first one is a simple 2-D reservoir model. This model 
is used to analyze the performance of the approaches 
mentioned in Section 15.11 The second model is a real- 
world reservoir model, and we apply the multiscale ap¬ 
proaches to this model to optimize the control strategy. 

5.2.1 Model 1: 5-spot model 

The first model is a single-layer reservoir containing 
four producing wells and one injection well in a five-spot 
well pattern [32] . The reservoir model is represented by 
a 51 X 51 uniform grid (Ax = At/ = 10m; Az = 5m). 
We consider only oil-water two phase flow. The reser¬ 
voir permeability field and well placements are shown 
in Fig.[^ We note that there are four different regions of 
homogeneous permeability. The permeabilities are 1000 
mD for the two high-permeability regions, and 100 mD 
for the two low-permeability regions. The porosity, net- 
to-gross ratio, and initial water saturation are all 0.2 at 
all grid blocks. 

The reservoir lifetime is set to 720 days. The injec¬ 
tion well INJ-01 (in Fig. is not controlled, the liq¬ 
uid rate is fixed at 240 mr/d. The liquid rates of four 



Fig. 6 The permeanbility field (mD) for model 1. 

producing wells are the optimization variables. Bound 
constraints are considered for the producing wells. The 
lower bound is set to 0 m^/d and the upper bound is 
80 rn^/d for PRO-01 & PRO-03 while 0 m^/d and 40 
m^/d are the lower and upper bounds for PRO-02 & 
PRO-04. The initial rates of all producing wells are 20 
w? jd. 

The objective function we use for this model is the 
NPV (see equation 0) and the corresponding economic 
parameters are given in Table 

Table 2 Economic parameters used for model 1. 


Parameter 

Value 

Oil revenue 

USD 500.0/m^ 

Water-production cost 

USD 250.0/m3 

Water-injection cost 

USD 80.0/m3 

Annual discount rate 

0 


We use Eclipse 100 m, a commercial reservoir sim¬ 
ulation software from Schlumberger Ltd., to calculate 
the relevant time-dependent production information for 
every well for all experiments in this paper. 

5.2.2 Model 2: PUNQ-S3 

The second reservoir model is the PUNQ-S3, which is 
a small-size reservoir model based on the North Sea 
reservoir [14]. The model contains a three phase gas- 
oil-water system with 19 x 28 x 5 grid blocks, of which 
1761 blocks are active. The field contains 6 production 
wells but no injection wells are present due to the strong 
aquifer. Fig. [^shows the tops (depth of the top phase), 
permeability and oil saturation present in the model. 
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Fig. 7 Property and wells of PUNQ-S3 field. 


n 



0.00000 0.20000 O.40OOO o.oOooo o.sflooo 

(c) Oil saturation 


We use a production period of 3840 days (about 10 
years), with a minimum control interval of 120 days. 
The initial liquid rates for all wells are 100 m^/d. The 
lower bound is set to 0 m^/d and the upper bound is 
200 m?/d for all wells. BHP bounds are also considered 
in this example. The lower BHP bounds are set to 12 
MPa and no upper bound is enforced for any producers. 
The economic parameters for the NPV calculation are 
given in Table 

Table 3 Economic parameters used for PUNQ-S3. 


Parameter 

Value 

Gas revenue 

USD 0.5/m^ 

Oil revenue 

USD 500.0/m^ 

Water-production cost 

USD 80.0/m3 

Annual discount rate 

0 


6 Results and discussion 

6.1 Effects of control frequency on well control 
optimization 

To show the effect of the control frequency on the NPV 
for the first model, as described in Section [5.2.1[ we turn 
off the multiscale approach and optimize using four dif¬ 
ferent, fixed control frequencies. Four control frequen¬ 
cies are considered and these constitute four variations 
of the optimization problem: 


• Case lA, each well is produced under a liquid rate 
throughout its lifetime. This gives 4 optimization 
variables in total. 

• Case IB, the liquid rate for each well updated every 
360 days (2 control periods). This gives 8 optimiza¬ 
tion variables in total. 

• Case 1C, the liquid rate for each well updated every 
90 days (8 control periods). This gives 32 optimiza¬ 
tion variables in total. 

• Case ID, the liquid rate for each well updated ev¬ 
ery 22.5 days (32 control periods). This gives 128 
optimization variables in total. 

Three optimization algorithms, GPS, PSO, and CMA- 
ES, are applied to each case to find the optimal controls 
and the corresponding NPV. 

6.1.1 Optimal NPV under different control frequencies 

Fig.ll compares the optimal NPV under different con¬ 
trol frequencies for this model. The results shown are 
the best values found using all the optimization ap¬ 
proaches. Well control with a reasonable frequency is 
necessary — we obtain a significantly higher NPV than 
what is possible when using a fixed rate over the life 
cycle (Case lA). It is clear that with the increase of 
the number of control adjustments, the optimal NPV 
grows more and more slowly. The increase in maximum 
NPV found is very slight (0.28 %) when the number 
of control steps for each well increases from 8 (Case 
1C) to 32 (Case ID). There is no need to adjust well 
rates too frequently. We will not see a considerable rev¬ 
enue increase and the increase in the number of con¬ 
trol adjustments will increase operation costs. Also the 
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problem with a large number of control adjustments 
is harder to optimize and have a higher risk of falling 
into a local optima (see Section 6.2.2). This justifies the 
use of multiscale approach to determine an appropriate 
control frequency. 


same test cases as in Section [6T] We use the maximum 
number of simulation runs as the stopping criterion, and 
this value is set to 100 times the number of control vari¬ 
ables. As PSO and CMA-ES are stochastic algorithms, 
10 trials are performed for these two algorithms to as¬ 
sess the average performance. 



Fig. 8 Optimum NPV for cases with different control fre¬ 
quencies. 


6.1.2 Optimal controls under different control 
frequencies 


Fig. [^presents optimum controls for wells PRO-01 and 
PRO-02 under different control frequencies. We omit 
the results for well PRO-03 and PRO-04 because the 
reservoir is symmetric. The optimum controls become 
more like a bang-bang solution for all wells with an in¬ 
crease in the number of control steps. It is worth noting 
that the optimum controls for Case lA are significantly 
different that those for Cases IB-ID. This reflects the 
different production strategies for wells using a static 
rate compared to using dynamic well controls in water 
flooding reservoirs. The similarity of optimum controls 
between different control frequencies is important for 
the success of a multiscale framework. As the multiscale 
approach uses the optimal controls found in iteration i 
as the initial guess for iteration z -|- 1, a good initial 
guess could accelerate optimization process and a bad 
initial guess may mislead the optimizer to wrong search 
areas and directions (see Section 6.2.1). Indeed Fig. 
shows this required similarity as the number of control 
steps is increased. 


6.2 Performance of GPS, PSO, and CMA-FS for well 
control optimization 

In this section we address the performance of GPS, 
PSO, and CMA-FS for well control optimization with¬ 
out the use of the multiscale framework. We use the 


6.2.1 Parameter tuning and the effect of the initial 
guess on GPS, PSO, and CMA-ES 

The performance of the optimization algorithms are af¬ 
fected by the choice of their parameter values. In this 
section, we complete parameter tunings for GPS, PSO, 
and CMA-FS to improve their performance in solving 
well control optimization problems. Here we perform 
a tuning study for two choices of initial guesses. The 
good initial guess is chosen to mimic the initial guess 
provided by the multiscale algorithm. The bad initial 
guess is purposely chosen to be far away from the opti¬ 
mal controls. 

We hypothesize that the performance of the local 
search algorithms are highly affected by the initial guess, 
while the stochastic global search algorithms are not. 
We take Case IB as an example and use the three dif¬ 
ferent initial guesses shown in Table For each initial 
guess, 10 trials were performed for PSO and CMA-ES 
and 1 trial for GPS (since it is a deterministic algo¬ 
rithm) . 


Table 4 Three initial guesses for GPS, PSO, and CMA-ES. 


Type 

Initial guess 

NPV, xl0“ USD 

good 

[20,20,20,20,20,20,20,20] 

5.0009 

medium 

[20,20,40,40,40,40,20,20] 

2.6484 

bad 

[0,40,0,80,0,80,0,40] 

-4.2826 


Fig. shows the plots of NPV versus the number 
of simulations for GPS, PSO, and CMA-ES. Each line 
represents one trial. The early convergence of GPS and 
PSO is affected by the choice of the initial guess. GMA- 
ES recovers quite quickly from the bad initial guess, 
even converges more quickly than from a good initial 
guess in some cases. With a large number of simulation 
runs, the effect of initial guess for all three algorithms 
is quite small. This suggests that if we want to make 
efficient use of the multiscale approach, the number of 
simulations at each scale should be limited. 

Again using a good and a bad initial guess, we an¬ 
alyzed the effect of the other parameters for PSO and 
CMA-ES, to find out the primary parameters which af¬ 
fect the performance of the algorithms. For PSO, the 
algorithm parameters include the population size A, and 
the parameters uj, C\, and ci- Three levels are cho¬ 
sen for each parameter. For CMA-ES, the algorithm 
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Fig. 9 Optimum well control for cases with different control frequencies. 
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parameters include the population size A, the parent 
number /t (number of candidate solutions used to up¬ 
date the distribution parameters), the recombination 
weights w, and the parameter cr, which determines the 
initial coordinate-wise standard deviations for the search. 


For each parameter choice, 10 trials are performed 
for Case IB with a good and a bad initial guess to start 
the optimization. We use the best NPV obtained af¬ 
ter 20% of maximum simulation runs as the evaluation 
criterion of the algorithm’s performance. We mainly fo¬ 
cus on the performance of the algorithms at an early 
stage because the multiscale framework requires a good 
early stage performance for the hybrid optimization al¬ 
gorithm. Also our test results showed that the perfor¬ 
mance of the algorithms are less sensitive to the param¬ 
eter values at a later stage. The performance for each 
parameter choice is shown in the beanplots in Fig. im 
and Fig. 12 A beanplot [57] promotes visual compar¬ 
ison of univariate data between groups. In a beanplot, 
the individual observations are shown as small points 
or small lines in a one-dimensional scatter plot. In addi¬ 
tion, the estimated density of the distributions is visible 
and the mean (bold line) and median (marker ‘-I-’) are 
shown. 


From Fig. [D we can see that for PSO the popu¬ 
lation size plays the most important role in the algo¬ 
rithm’s ability to utilize the good initial guess. When 
the population size equals 20 or 50, PSO with a bad 
initial guess obtains a similar NPV as PSO with a good 
initial guess. This is because with the same number 
of simulation runs, PSO with a small population size 


can evolve more generations, and this decreases the af¬ 
fect of the initial guess. The bigger population size, the 
smaller the variability in the NPV results with a similar 
mean value. For these reasons, we choose A = 100 for all 
subsequent PSO experiment. Parameter is one of the 
weighting parameters, the bigger C 2 , the greater the ten¬ 
dency for the particles to fly towards the best location 
found so far. We suggest a bigger C 2 when combining 
with the multiscale approach. The parameters w and 
Cl have no obvious affect in this case. Generally for all 
parameter values PSO responds favorably to the better 
initial guess. 

From Fig. we can see that for CMA-ES the good 
initial guess gives a higher average NPV with smaller 
variability. For this problem, the best conhguration is 
cr = 0.3, A = 10, p. = 2, and w = superlinear; this 
is also the default configuration of CMA-ES. In fact, 
according to the work of [501, since finding good pa¬ 
rameters is considered as part of the algorithm design, 
CMA-ES does not require significant parameter tuning 
for its application. The choice of parameters is not left 
to the user (arguably with the exception of population 
size a). 


6.2.2 Performance with different control frequencies 

Tablej^and Fig.[T3|show the performance of GPS, PSO, 
and CMA-ES for well control optimization problems 
with different control frequencies. In Tablethe maxi¬ 
mum, minimum, mean, median, and standard deviation 
of the NPV for each case are given. From the table we 
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(b) PSO, cj 




Cl 

(c) PSO, Cl 


C2 

(d) PSO, C2 


Fig. 11 Beanplots of the NPV for various parameter settings of PSO. The left side of each beanplot gives the results obtained 
with a good initial guess, and the right side gives the results obtained with a bad initial guess. The individual dots show the 
NPV obtained by each trial. The background pink and green colors show the distribution of results. The short horizontal line 
and the marker ‘+’ denote the mean and median of all 10 trials, respectively. 


can see that, for Case lA, which has only 4 variables, 
GPS obtains the highest NPV after 400 simulation runs. 
Similar results are found in Case IB. In Case 1C, the 
maximum NPV of CMA-ES exceeds the GPS, but the 
mean and median NPV for CMA-ES are lower than 
these of GPS. In these three cases, although the final 
NPV of GPS is larger than the final NPV for CMA-ES 
and PSO, the difference of the mean/median NPV for 
the three algorithms is quite small (less than 2%). In 
Case ID, which has 128 variables, the NPV obtained 
by GPS is obviously lower than CMA-ES. Generally, 
CMA-ES showed excellent performance in most cases. 
GPS performs best when the problem dimension is very 
small. 


Pig. [T3| shows the plots of NPV versus the number 
of simulation runs for the four cases. In this figure, we 
use a solid line to show the median NPV of each algo¬ 
rithm, and use the same color as the line to fill the area 
between the maximum and minimum NPV for each al¬ 
gorithm. These plots clearly show the performance of 
GPS, PSO, and CMA-ES using different computational 
budgets. GPS obtains the highest NPV for Case lA- 


IC at the end of optimization. But the budget (number 
of simulation runs) required for GPS grows rapidly as 
the dimension of the problem increases. GPS converged 
with no more than 50% of total budget for Cases lA and 
IB, and about 80% of the total budget for Case 1C. Eor 
Case ID, GPS did not converge after lOOD simulation 
runs. CMA-ES obtains almost as high a NPV as GPS 
for Gase lA-lC, and it obtains highest NPV for Gase 
ID. Furthermore, CMA-ES showed an excellent perfor¬ 
mance when the budget is limited. PSO outperforms 
GPS for a low budget and a large problem dimension, 
but it still can not beat CMA-ES in these cases. 

Since PSO and CMA-ES are stochastic algorithms, 
the performance is different for each trial. In Table 
we can see the standard deviation for PSO is larger 
than the standard deviation for CMA-ES. In Figure [T3| 
we can see that the best NPV obtained has a higher 
variation for low computational budgets than for high 
budgets. For PSO, the variability did not decrease in 
Case 1C and ID as the algorithm converged. 

To investigate further, we choose 2 of the 32 vari¬ 
ables and 5 of the 10 trials for Case 1C and then com- 
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(a) CMA-ES, a (b) CMA-ES, A 




(c) CMA-ES, (d) CMA-ES, uj 

Fig. 12 Beanplots of the NPV for various parameter settings of CMA-ES. The left side of each beanplot gives the results 
obtained with a good initial guess, and the right side gives the results obtained with a bad initial guess. The individual dots 
show the NPV obtained by each trial. The background pink and green colors show the distribution of results. The short 
horizontal line and the marker denote the mean and median of all 10 trials, respectively. 

Table 5 Results for Cases lA-lD for Model 1 using GPS, PSO, and CMA-ES. Values shown are NPV (xlO® USD). 


Case 

Algorithm 

Trials 

Max 

Min 

Mean 

Median 

Std. 

lA 

GPS 

1 

5.3132 

5.3132 

5.3132 

5.3132 

- 


PSO 

10 

5.2850 

5.1850 

5.2603 

5.2720 

0.0310 


CMA-ES 

10 

5.3121 

5.2969 

5.3034 

5.3045 

0.0048 

IB 

GPS 

1 

10.3539 

10.3539 

10.3539 

10.3539 

- 


PSO 

10 

10.3200 

9.4220 

10.0840 

10.1700 

0.2819 


CMA-ES 

10 

10.3536 

10.3511 

10.3527 

10.3528 

0.0008 

1C 

GPS 

1 

12.3470 

12.3470 

12.3470 

12.3470 

- 


PSO 

10 

12.2700 

11.3800 

11.9660 

12.1050 

0.2966 


CMA-ES 

10 

12.3474 

12.3447 

12.3466 

12.3467 

0.0007 

ID 

GPS 

1 

9.5083 

9.5083 

9.5083 

9.5083 

- 


PSO 

10 

11.7000 

10.2300 

11.1290 

11.1700 

0.4910 


CMA-ES 

10 

12.4285 

12.3466 

12.4054 

12.4178 

0.0315 


pare the population distribution of CMA-ES and PSO 
at different iterations. The resulting scatter diagrams 
are shown in Fig. In Case 1C, the population size 
is 100 for PSO, and 14 for CMA-ES. Hence after same 
number of simulation runs, CMA-ES and PSO are at 
different iteration numbers. After 500 simulation runs, 
CMA-ES is at the 36th iteration, while PSO is at the 
5th iteration. After about 3000 simulation runs (Fig. 
|14(d)| and |14(h)[ ), we can see that PSO has converged to 


different locations for each trial. Compared with CMA- 
ES, PSO is more easily falls into local optima in our 
test cases, in spite of the larger population size and the 
ability search the entire space. 

6.2.3 Performance in parallel environments 

GPS, PSO, and CMA-ES parallelize naturally. Here we 
investigate the performance of these three algorithms 
in parallel environments. Table gives the population 
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Fig. 13 Optimization performance for well control problems using GPS, PSO and CMA-ES. The solid lines are median NPV 
over all 10 runs of PSO and CMA-ES without the multiscale framework. And the areas between maximum and minimum NPV 
are filled with the corresponding color. Note that the x-axis scale is different for each case. 


sizes of GPS, PSO, and CMA-ES in Cases lA-lD. The 
population size for GPS and CMA-ES are decided au¬ 
tomatically by the algorithms based on the problem di¬ 
mension. For a £)-dimensional problem, the population 
size equals 2D for GPS, and 4-|- [3In {D)\ for CMA-ES. 
The population size for PSO is usually decided by the 
user and we use 100 for all cases (more discussion on 
the population size is given in Section 6.2.11. 

In a parallel environment, we can evaluate a number 
of individuals, up to the number of processors, simul¬ 
taneously. Note that we are not able to evaluate the 
individuals from different iterations at the same time. 
For well control optimization problems, the computa¬ 
tion time is mainly spent evaluating the reservoir sim¬ 
ulation, a parallel environment can greatly reduce the 
time of optimization. 

Assume we have three parallel environments, with 
8, 32, and an infinite number of processors, respec¬ 
tively. Fig. compares the parallel performance of 
GPS, PSO, and CMA-ES in the parallel environments 
to the performance in a sequential environment. We 
use the number of runs as the y-axis in this figure. One 
run evaluates a number of potential solutions up to the 


Table 6 Population size of GPS, PSO, and CMA-ES for 
Case lA-lD. D is the number of variables in the problem. 


Case 

D 

GPS 

PSO 

CMA-ES 

lA 

4 

8 

100 

8 

IB 

8 

16 

100 

10 

1C 

32 

64 

100 

14 

ID 

128 

256 

100 

18 


number of processors. In an iteration, if the number of 
potential solutions is less than the number of proces¬ 
sors, then all the potential solutions are evaluated in a 
single run, with some processors idle. The number of 
runs is equal to the number of simulations if we have 
only one processor. 

In Fig. |15[ we compare the number of runs needed 
to get from the initial NPV to 50% of the final NPV, 
as well as the number of runs needed to reach the max¬ 
imum number of simulator evaluations (100 times the 
problem dimension). From this figure we can see that, 
with the increase of processors, the number of runs 
required for GPS, PSO, and CMA-ES decrease, until 
the number of processors is larger than the population 
size. For an algorithm, the larger the population size. 
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Fig. 14 Scatter diagrams for CMA-ES and PSO for Case 1C. The bracketed number in the caption of each sub-figure is the 
iteration number. Points represent the candidate solutions at this specific iteration. 


the greater the benefits from the parallel environment. 
With an increase in the number of processors, the or¬ 
der of three algorithms changes. For Case lA, GPS per¬ 
forms best followed by CMA-ES and PSO in the sequen¬ 
tial environment (number of processors equals 1). The 
order becomes GPS>PSO>CMA-ES when the number 
of processors reaches 32. Eurthermore, with enough pro¬ 
cessors (> 100), the order becomes PSO>GPS>CMA- 
ES. The order also changes depending on the number 
of processors for Case 1B~1D. Generally, A parallel en¬ 
vironment can greatly reduce the time spent for these 
algorithms. PSO can outperform GPS and GMA-ES in 
performance if the number of processors is large enough. 


6.3 Multiscale optimization for Model 1 

In this section we address the performance of the mul¬ 
tiscale approaches (M-GPS, M-PSO, and M-CMA-ES) 
for well control optimization. We use the first model as 
described in Section |5.2[ We stop the optimization at 
each scale when the average relative well rate change 
is less than 10% of the distance between the upper 
and lower bounds on the well rates. The scale will no 
longer be refined when the relative change in the NPV 
is < 10% between two neighboring scales. The maxi¬ 


mum number of simulation runs for the problem is set 
to 3200. As M-PSO and M-GMA-ES are stochastic al¬ 
gorithms, 10 trials were performed for these two algo¬ 
rithms to assess the overall performance. 


6.3.1 Performance of the multiscale approaches 


As in Section |5.H we consider four configurations for 
each multiscale approach. As a first test, the multiscale 
optimization process is terminated when the number of 
control steps reaches 8 each well for configuration I-III, 
and 16 each well for configuration IV. 

The plots of NPV versus the number of simula¬ 
tion runs for the different multiscale approaches and 
the different configurations, as well as the plots for the 
standard algorithms (GPS, PSO, GMA-ES) with 8 pre¬ 
set control steps for each well, are shown in Fig. 16 
From the figure we see that compared with direct op¬ 
timization with 8 well control adjustments, both GPS 
and PSO converge faster when using the multiscale ap¬ 
proach. GPS convergence improves the most amongst 
the three algorithms. Fig. |16(b)| showed that for this 
test case, M-PSO could locate a control strategy which 
gives a higher NPV than PSO. The performance of 


GMA-ES (Fig. 16(c) I is quite different. Results showed 


that GMA-ES converges faster than M-GMA-ES for 























18 


Xiang Wang et al. 


IZn GPS, 50% IZZI GPS, final 

IZZI PSO, 50% IZZI PSO, final 

IZZI CMA-ES, 50% IZZI CMA-ES, final 



IZZI GPS, 50% IZZI GPS, final 

IZZI PSO, 50% IZZI PSO, final 

IZZI CMA-ES, 50% IZZI CMA-ES, final 



Number of processors 


Number of processors 


(a) Case lA 


(b) Case IB 


IZZI GPS, 50% IZZI GPS, final 

IZZI PSO, 50% IZZI PSO, final 

IZZI CMA-ES, 50% IZZI CMA-ES, final 



IZZI GPS, 50% IZZI GPS, final 

IZZI PSO, 50% IZZI PSO, final 

IZZI CMA-ES, 50% IZZI CMA-ES, final 



Number of processors 


Number of processors 


(c) Case 1C 


(d) Case ID 


Fig. 15 Number of runs required for the well control optimization problems in parallel environments with different numbers 
of processors. “50%” in the legend denotes the number of runs required to reach 50% of the mcuximum NPV for the algorithms, 
“final” in the legend denotes the total number of runs required to reach the maximum number of simulator evaluation for the 
algorithms. 


this relatively small scale optimization problem. This 
is because CMA-ES is less sensitive to the quality of 
the initial guess and hence it can not take great ad¬ 
vantage of the multiscale framework to speed up its 
convergence rate. The multiscale framework for CMA- 
ES still does, however, give us a way to detect a good 
control frequency for well control optimization. 

As a second test of the multiscale framework we in¬ 
crease the number of control steps to 32 for each well. 
The plots of NPV versus the number of simulation runs 
for the different multiscale approaches and the different 
configurations, as well as the plots for the standard al¬ 


gorithms (GPS, PSO, CMA-ES) are shown in Eig. 17 


From the figure we see that compared with direct op¬ 
timization with 32 well control adjustments, all algo¬ 
rithms converge faster when using the multiscale ap¬ 


proach. GPS convergence improves the most amongst 
the three algorithms, followed by PSO and GMA-ES. 

6.3.2 Performance in parallel environments 

Fig. shows the performance of the multiscale ap¬ 
proaches in parallel environments with 8, 32, and an 
infinite number of processors using 8 pre-set final con¬ 
trol steps for configurations I-III and 16 control steps 
for configuration IV. As in Section |6.2.3[ we use the 
number of runs as the y-axis. The lines for PSO, CMA- 
ES, M-PSO, and M-CMA-ES are the medians of 10 tri¬ 
als. From this figure we can see that, not surprisingly, 
the more processors the less runs needed to converge. 
In a parallel environment, the improved convergence of 
the multiscale approach is less apparent. The multiscale 
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Runs 


Runs 


Runs 


(a) GPS, CPU=8 


(b) PSO, CPU=8 


(c) CMA-ES, CPU=8 




100 120 140 



(d) GPS, CPU=32 


(e) PSO, GPU=32 


(f) GMA-ES, CPU=32 





Runs 


Runs 


Runs 


(g) GPS, CPU=inf 


(h) PSO, CPU=inf 


(i) CMA-ES, CPU=inf 


Fig. 18 Comparison of the performance of multiscale approaches with 8 final control adjustments for each well in parallel 
environments with 8, 32, and an infinite number of processors. 


approach still benefits, however, if the optimal control 
frequency is unknown and hence not specified a priori. 

In Fig. we repeat this experiment for the more 
difficult problem with 32 control steps for each well. 
The results are similar. 

6.3.3 Performance with different computational 
budgets 

Here we assess the effect of computational budget on 
the efficacy of the multiscale approach. We use 300, 
1500, and 3000 simulation runs as a low, medium, and 
high budget. For different budgets, the performance of 
the different multiscale approaches and the different 
conhgurations are shown in several beanplots in Fig. |20[ 
In the beanplots, the individual observations are shown 
as small lines in a one-dimensional scatter plot. The es¬ 
timated density of the distributions is visible and the 
mean (blue bold line) and median (red are shown. 


Fig |20 (a)] shows results for a low budget. Since GPS 
is a deterministic algorithm, we have no distribution 
for the four configurations of M-GPS. M-GPS-II and 
M-GPS-III obtained the highest NPV amongst all four 
conhgurations. Although some conhgurations of M-PSO 
and M-GMA-ES could obtain a relatively high NPV, 
there is also has a risk of obtaining a low NPV due to 
the high variability. For a medium budget. Fig |20(b)[ 
the variation of all four conhgurations of M-CMA-ES 
is relatively small. The variation of the M-PSO is quite 
large. With this budget of function evaluations, M-GPS 
and M-GMA-ES are good choices. Eor a large budget, 
we see that M-GPS-IV obtained a higher NPV than the 
other conhgurations. With the fourth conhguration, the 
approach terminated at 16 control steps for each well, 
while the other three conhgurations terminated at 8 
control steps for each well. With more control steps, 
a higher NPV could be obtained. Conhguration I per¬ 
forms less well with a low budget. This is because the 
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(g) GPS, CPU=inf 


(h) PSO, CPU=inf 


(i) CMA-ES, CPU=inf 


Fig. 19 Comparison of the performance of multiscale approaches with 32 final control adjustments for each well in parallel 
environments with 8, 32, and an infinite number of processors. 


initial number of control steps for configuration I is 1. 
The optimal control found with only 1 step is very dif¬ 
ferent when compared with the optimal control found 
with more steps. In general, we see that the second con¬ 
figuration performs better than the other three config¬ 
urations for M-GPS, M-PSO and M-CMA-ES. Further¬ 
more, M-GPS-II is highly recommended for all budgets. 


6.4 Multiscale optimization for a real-world reservoir 


Based on the results of the previous section we now 
apply the multiscale approaches with configuration II 
(no = 2 and = 2) to solve the well control problem of 


reservoir PUNQ-S3 (see Section 5.2.2). The maximum 


number of function evaluations is set to 10000. We use 
an average relative well rate change of < 10% of the 
gap between the upper and lower bound as the stop¬ 
ping criterion for each scale. The scale will no longer be 


refined when the relative change in the NPV is < 10% 
between two neighboring scales. Due to the large com¬ 
putational time we perform only 3 trials for M-PSO and 
M-GMA-ES. 

Fig. shows the performance of M-GPS-II, M- 
PSO-II and M-CMA-ES-II. The performance of GPS, 
PSO, and CMA-ES using a pre-set control frequency 
of 32 control steps for each well is shown in this hgure 
as well. The results show that for the same number of 
reservoir simulations, combining these three algorithms 
with the multiscale framework gives higher NPV val¬ 
ues as compared to directly optimizing with the largest 
number of control steps. 

The values of NPV found by the approaches after 
1000, 5000, and 10000 simulation runs are shown in Ta- 
ble[^ From the table we can see that, M-GPS-11 outper¬ 
forms among all three multiscale approaches, even when 
the number of simulation runs is limited. Of course the 
deterministic M-GPS has the additional advantage of 
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(c) High budget, 3000 simulation runs 
Fig. 20 Beanplots for different configurations of our multiscale approaches. 


having no variability in the outcome. Without the mul¬ 
tiscale framework, CMA-ES performs the best. 


Table 7 Median NPV (10® USD) with different budgets for 
PUNQ-S3. 


Algorithm 

Trails 

1000 

5000 

10000 

GPS 

1 

2.9936 

3.0061 

3.0194 

M-GPS-II 

1 

3.0567 

3.0570 

3.0570 

PSO 

3 

3.0240 

3.0365 

3.0395 

M-PSO-II 

3 

3.0555 

3.0560 

3.0560 

GMA-ES 

3 

3.0384 

3.0558 

3.0572 

M-CMA-ES-II 

3 

3.0561 

3.0563 

3.0568 


7 Conclusions 

In this paper we have considered three derivative-free 
optimization algorithms combined with a multiscale frame¬ 
work to solve well control optimization problems. The 
optimization algorithms used include GPS, which is a 
deterministic local search approach; PSO, which is a 
stochastic global search method; and CMA-ES, which 
is a stochastic local search method. A generalization of 
the successive-splitting multiscale approach from [301 
138] was introduced to combine with the derivative-free 
optimization algorithms. 

Based on thorough numerical experiments the fol¬ 
lowing conclusions can be drawn: 

• The control frequency does have a significant effect 
on well control optimization problems. The more 
frequent the well control adjustment, the higher the 
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Fig. 10 Plots of NPV versus the number of simulation runs 
for GPS, PSO, and CMA-ES with different initial guesses 
with 8 control steps. Each line represents one trial. The color 
red, green, blue denotes the good, medium, bad initial guess, 
respectively. 


NPV that can be obtained but at the cost of a 
harder optimization problem. This increase becomes 
less significant as we continue to increase the num¬ 
ber of control steps. Considering the operation costs, 
each reservoir has a optimal control frequency. The 
optimal controls are similar with different control 
frequencies when every well is produced under a liq¬ 
uid rate throughout its lifetime. 


Q 

CO 

D 



500 1000 1500 2000 2500 3000 

Simulation runs 

(b) PSO 



Simulation runs 

(c) CMA-ES 


Fig. 16 Comparison of the performance of multiscale ap¬ 
proaches with different configurations with 8 final control ad¬ 
justments for each well. 


• Without the multiscale framework, GPS performs 
best when the problem dimension is very small and 
the budget is large enough. CMA-ES showed excel¬ 
lent performance when the budget is limited. A par¬ 
allel environment can greatly reduce the time spent 
for these algorithms. PSO can outperform GPS and 
CMA-ES in performance if the number of proces¬ 
sors is large enough. The choice of the initial guess 
has a significant effect on the convergence speed for 
GPS and PSO at the early stage of optimization. 
CMA-ES, by contrast, is less sensitive to the choice 
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Simulation runs 

(c) CMA-ES 

Fig. 17 Comparison of the performance of multiscale ap¬ 
proaches with different configurations with 32 final control 
adjustments for each well. 


of initial guess. The performance of PSO is affected 
dramatically by the population size. Parameter tun¬ 
ing for CMA-ES showed that the default settings 
work quite well. 

• The multiscale approaches have two advantages in 
solving well control problem. (1) they provide a way 
to optimize the control frequency and the well con¬ 
trols simultaneously. (2) when compared to the stan¬ 
dalone algorithms the multiscale approach can speed¬ 
up the convergence. Based on the results of the test 





Simulation runs 

(c) CMA-ES 

Fig. 21 Comparison of the performance of multiscale ap¬ 
proach and optimizers without multiscale for PUNQ-S3. Here 
are the median NPV of trials for PSO and CMA-ES. 


cases, the convergence of GPS and PSO improves 
the most when combined with the multiscale frame¬ 
work. The multiscale framework is more efficient as 
the number of control steps increases. The difference 
in performance between the multiscale hybrid algo¬ 
rithms and the stand-alone algorithms decreases as 
the number of processors increases. 

• The multiscale framework has two key parameters, 
the choice of the initial number of control steps no 
and the split factor n^. The choice no = 2 and 
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Us = 2 gave the best performance. In the multi¬ 
scale framework, M-GPS-II is highly recommended 
for any computational budget. 

All above conclusions are based on the experiments 
in this paper. Although the multiscale approach has 
shown its potential to solve complex well control prob¬ 
lems, there are still many possible avenues for future 
work. Some potential areas of investigation are the use 
of other stochastic approaches such as mesh adaptive 
direct search or differential evolution to see if they offer 
any improvement in performance. There is also flexibil¬ 
ity to choose different Uq and Ug values for each well. 
The performance of multiscale approaches with non¬ 
linear constraints still needs additional study. As does 
the development of robust stopping criteria within the 
multiscale framework. 
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